
feat: Dynamic Open Graph meta tags for social media link previews #1251

Merged
kevinschaper merged 7 commits into main from issue-1242-link-previews
Feb 10, 2026

@kevinschaper
Member

Summary

Enables entity-specific link previews when sharing Monarch URLs on social media platforms (Slack, Twitter, Discord, Mastodon, Bluesky, etc.).

Closes #1242

How It Works

Architecture

Request for /MONDO:0005148
         │
         ▼
      Nginx
         │
         ├─── Is User-Agent a bot? ───► Yes ──► Proxy to /v3/api/meta/MONDO:0005148
         │                                              │
         │                                              ▼
         │                                      FastAPI fetches entity from Solr,
         │                                      returns HTML with dynamic OG tags
         │
         └─── No (regular user) ──► Serve index.html (SPA loads normally)

Nginx Bot Detection

Nginx uses a map directive to detect crawler User-Agents and routes them to the meta endpoint:

map $http_user_agent $is_bot {
    default 0;
    ~*Slackbot 1;
    ~*Twitterbot 1;
    ~*facebookexternalhit 1;
    ~*Discordbot 1;
    ~*Mastodon 1;
    ~*Bluesky 1;
    # ... and more
}

Only CURIE-format paths (e.g., /MONDO:0005148) are eligible - regular pages like /about are unaffected.
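
The routing rule described above can be modelled in a few lines of Python. This is only a sketch of the decision, not the nginx configuration itself: the bot list is the subset shown in the map block, the CURIE pattern is the one quoted from default.conf later in the review, and the function name and return values are made up for illustration.

```python
import re

# Assumed subset of crawler names; the real nginx map lists more.
BOT_PATTERN = re.compile(
    r"Slackbot|Twitterbot|facebookexternalhit|Discordbot|Mastodon|Bluesky",
    re.IGNORECASE,  # mirrors the case-insensitive ~* match in nginx
)

# CURIE path pattern as quoted in the review of default.conf.
CURIE_PATH = re.compile(r"^/[A-Za-z_]+:[A-Za-z0-9_.-]+$")


def route(path: str, user_agent: str) -> str:
    """Return where a request would be sent (illustrative names only)."""
    is_bot = BOT_PATTERN.search(user_agent) is not None
    is_entity = CURIE_PATH.match(path) is not None
    if is_bot and is_entity:
        # Crawlers asking for an entity page get the meta endpoint.
        return f"/v3/api/meta{path}"
    # Everything else falls through to the SPA.
    return "/index.html"
```

Under these assumptions, a Slackbot request for /MONDO:0005148 is sent to the meta endpoint, while /about, or the same path requested by a browser, still gets index.html.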

Meta Endpoint

The FastAPI endpoint fetches the entity from Solr and renders HTML with OG tags:

@router.get("/meta/{entity_id:path}", response_class=HTMLResponse)
async def get_meta_page(entity_id: str, request: Request) -> HTMLResponse:
    entity = solr().get_entity(entity_id, extra=False)
    
    # Build title and description from entity data
    title = f"{entity.name} | Monarch Initiative"
    description = f"{entity.name} - {entity.description}"
    
    # Derive URL from request headers (works for beta and prod)
    base_url = get_base_url(request)
    
    return template.render(title=title, description=description, ...)
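
For readers unfamiliar with Open Graph markup, here is a rough sketch of the kind of output the template step produces. It uses the standard library's html.escape instead of the Jinja2 template from the PR, and the function name and the exact set of tags are assumptions made for illustration.

```python
from html import escape


def render_og_tags(title: str, description: str, url: str, image: str) -> str:
    """Build Open Graph meta tags, escaping every value for attribute context."""
    tags = {
        "og:title": title,
        "og:description": description,
        "og:url": url,
        "og:image": image,
    }
    return "\n".join(
        # quote=True also escapes quotes so values cannot break out of content="..."
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in tags.items()
    )
```

Escaping each value is what keeps an entity name containing markup from being interpreted as HTML by whatever consumes the page.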

Dynamic URL Support

URLs are derived from request headers, so the same deployment works for both environments:

  • beta.monarchinitiative.org → OG tags reference beta URLs
  • monarchinitiative.org → OG tags reference production URLs
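
A minimal sketch of header-based URL derivation follows. The review comments further down mention that get_base_url() reads X-Forwarded-Proto and Host; everything else here (the function name, the plain mapping argument, and the fallback values) is an assumption rather than the PR's code.

```python
from typing import Mapping


def base_url_from_headers(
    headers: Mapping[str, str],
    default_scheme: str = "https",           # assumed fallback
    default_host: str = "monarchinitiative.org",  # assumed fallback
) -> str:
    """Derive scheme://host from proxy headers (keys assumed to be lowercase)."""
    scheme = headers.get("x-forwarded-proto", default_scheme)
    host = headers.get("host", default_host)
    return f"{scheme}://{host}"
```

With a Host of beta.monarchinitiative.org the sketch returns a beta URL, and with monarchinitiative.org a production one, which is the behaviour the two bullets above describe.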

Supported Platforms

Slackbot, Twitterbot, Facebook, LinkedIn, Discord, WhatsApp, Telegram, Mastodon, Bluesky, Googlebot, Bingbot, Embedly, Pinterest, Applebot

Files Changed

File                                             Purpose
backend/src/monarch_py/api/meta.py               FastAPI endpoint
backend/src/monarch_py/api/templates/meta.html   Jinja2 template
services/nginx/config/default.conf               Bot detection routing
backend/tests/api/test_meta.py                   Unit tests
docs/link-previews.md                            Documentation

Enable entity-specific link previews on Slack, Twitter, Discord, Mastodon,
Bluesky, and other social platforms by serving dynamic OG tags to crawler bots.

Closes #1242
@netlify

netlify bot commented Jan 30, 2026

Deploy Preview for monarch-app canceled.

Name Link
🔨 Latest commit bfadb8a
🔍 Latest deploy log https://app.netlify.com/projects/monarch-app/deploys/698bb1841c34030008a6e1aa

@claude

claude bot commented Jan 30, 2026

PR Review: Dynamic Open Graph Meta Tags for Social Media Link Previews

Summary

This PR implements a solid architecture for serving dynamic Open Graph tags to social media crawlers while maintaining the SPA experience for regular users. The implementation is clean and well-documented.

✅ Strengths

  1. Excellent Architecture: The nginx-based bot detection approach is efficient and keeps the SPA fast for users
  2. Comprehensive Documentation: The docs/link-previews.md and PR description are excellent
  3. Good Test Coverage: Both backend unit tests and nginx integration tests included
  4. Security Conscious: Jinja2 autoescape is enabled, protecting against XSS
  5. Environment Agnostic: Dynamic URL derivation works seamlessly across beta/prod

🔍 Code Quality & Best Practices

backend/src/monarch_py/api/meta.py:56-58

Issue: Broad exception catching loses important error context. The current code catches all exceptions and returns 404, which masks genuine errors like database connection failures. Consider catching specific exceptions or re-raising non-404 errors as 500s.
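
One way to read this suggestion, sketched without FastAPI so it stays self-contained: map each lookup outcome to a status code instead of collapsing everything to 404. The choice of ValueError and ConnectionError is an assumption for illustration, as are the function name and the callable-based signature.

```python
from typing import Callable, Optional, Tuple


def lookup_status(
    fetch: Callable[[str], Optional[dict]], entity_id: str
) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status, entity) for a lookup, keeping 404 and 500 apart."""
    try:
        entity = fetch(entity_id)
    except ValueError:
        # Assumed to signal a malformed or unknown identifier.
        return 404, None
    except ConnectionError:
        # Backend unavailable: a server-side failure, not a missing entity.
        return 500, None
    if entity is None:
        return 404, None
    return 200, entity
```

Any other exception type propagates to the caller, so unexpected failures are not silently reported as "not found".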

backend/src/monarch_py/api/meta.py:16-20

Observation: Jinja environment is initialized at module level. If TEMPLATES_DIR does not exist, the app will fail at import time rather than startup. Consider validating TEMPLATES_DIR exists during initialization.
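
A possible shape for that check, assuming TEMPLATES_DIR is a pathlib.Path; the helper name and the error type are hypothetical.

```python
from pathlib import Path


def require_templates_dir(path: Path) -> Path:
    """Return the directory unchanged, or raise with a message naming the missing path."""
    if not path.is_dir():
        raise RuntimeError(f"Template directory not found: {path}")
    return path
```

Calling this before the Jinja environment is built would still fail fast, but with an error that names the missing directory.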

services/nginx/config/default.conf:26

Minor: CURIE regex could be more restrictive. The pattern matches paths like /_:--- which are not valid CURIEs. However, if your CURIEs use lowercase/underscores, the current pattern is fine.

🔒 Security

XSS Protection: Jinja2 autoescape properly enabled
Injection Attacks: No SQL/command injection vectors
⚠️ Open Redirect (Low Risk): Meta refresh redirect is low risk due to nginx regex + Solr validation
Header Trust: X-Forwarded-Proto and Host headers are properly set by nginx

⚡ Performance

Good: Jinja environment cached at module level

Suggestion: Add HTTP cache headers since entity data changes infrequently:

headers={"Cache-Control": "public, max-age=3600"}

🧪 Test Coverage

Good Coverage: Happy path, 404 handling, XSS protection

Missing Test Cases:

  1. Entity with special characters in name/description
  2. Entity with very long description (truncation logic)
  3. Entity with missing name/description fields
  4. Response headers validation (Cache-Control)
  5. X-Forwarded-Proto header handling
  6. Non-entity paths in nginx tests

🎯 Recommendations

High Priority: None - production-ready as-is ✅

Medium Priority:

  1. Error handling: Distinguish 404s from 500s
  2. Performance: Add HTTP cache headers
  3. Test coverage: Add missing test cases

Low Priority:

  1. Stricter CURIE regex (if applicable)
  2. Validate TEMPLATES_DIR on startup

✅ Final Verdict

Approved with minor suggestions. Well-designed, secure, and well-tested. The nginx-based routing is excellent. Suggestions are optimizations, not blockers.

Great work! 🎉

@claude

claude bot commented Feb 9, 2026

Pull Request Review: Dynamic Open Graph Meta Tags

This is a well-architected implementation of dynamic social media previews for entity pages. The approach is clean, well-tested, and follows best practices. Here's my detailed feedback:


Strengths

  1. Excellent Architecture: The nginx-level bot detection is efficient and avoids impacting regular users. Routing bots to a separate endpoint while serving the SPA normally is the right approach.

  2. Strong Security: XSS protection via Jinja2's autoescape is properly configured and tested (test_meta_endpoint_escapes_html_in_content).

  3. Good Test Coverage:

    • Unit tests cover success, 404, and XSS scenarios
    • Integration test script covers 8 different bot user agents
    • Both backend and nginx layers are tested

  4. Environment-Agnostic: Dynamic URL derivation from request headers means no config changes between beta/prod.

  5. Clear Documentation: docs/link-previews.md provides excellent overview and testing instructions.


🔍 Code Quality Observations

backend/src/monarch_py/api/meta.py

Good:

  • Clean separation of concerns with helper functions
  • Proper error handling and logging
  • Good fallback behavior for missing entity data

Minor Suggestions:

  1. Redundant None check (lines 60-61): The exception handler at lines 56-58 already catches failures, so the explicit if entity is None check is redundant unless get_entity can return None without raising an exception. Consider clarifying this behavior or removing the redundant check.

  2. Description truncation edge case (lines 80-82): When truncating at 300 characters, you might cut in the middle of a word. Consider using a word boundary:

    if len(description) > max_description_length:
        description = description[:max_description_length].rsplit(' ', 1)[0] + "..."

  3. Magic number: The 300 character limit should be a constant at the module level:

    MAX_DESCRIPTION_LENGTH = 300
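
Suggestions 2 and 3 combine naturally into one helper. The following is a sketch under the assumption that truncation lives in its own function; the function name is hypothetical.

```python
MAX_DESCRIPTION_LENGTH = 300


def truncate_description(description: str, limit: int = MAX_DESCRIPTION_LENGTH) -> str:
    """Cut text longer than `limit` at the last space inside the limit, then add "..."."""
    if len(description) <= limit:
        return description
    # Slice first, then drop the trailing partial word.
    return description[:limit].rsplit(" ", 1)[0] + "..."
```

For example, truncate_description("alpha beta gamma", limit=8) slices to "alpha be", drops the partial word, and returns "alpha...".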

services/nginx/config/default.conf

Concerns:

  1. Regex pattern may be too permissive (line 26): The pattern ~^/[A-Za-z_]+:[A-Za-z0-9_.-]+$ allows underscores in the prefix and dots/hyphens in the ID. While this may be intentional, verify this matches all valid CURIEs and doesn't match invalid ones.

  2. Missing proxy headers in bot proxy (lines 54-57): When proxying bot requests to the meta endpoint, you're not setting the same proxy headers that are set for API requests (lines 110-113). This means X-Forwarded-Proto and other headers won't be passed through. While get_base_url() checks these headers, they won't be present.

    Recommendation: Add proxy headers to the bot proxy block:

    if ($serve_meta = "BE") {
        rewrite ^/(.*)$ /v3/api/meta/$1 break;
        proxy_pass http://api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

  3. Nginx if anti-pattern: The nested if statements (lines 46-57) are generally discouraged in nginx. While the workaround using string concatenation is clever, this could be cleaner with a map directive. However, given the current implementation works, this is a low-priority refactor.

backend/src/monarch_py/api/templates/meta.html

Concerns:

  1. Meta refresh security (line 28): The meta http-equiv="refresh" could potentially be used for open redirect attacks if the url variable is ever user-controlled. While it's currently derived from the entity ID and request headers, consider:
    • Adding URL validation in get_meta_page()
    • Or using a JavaScript redirect instead
    • Or removing the redirect entirely (bots don't follow it anyway)

backend/tests/api/test_meta.py

Good:

  • Tests cover the important scenarios
  • XSS test is excellent

Suggestions:

  1. Missing test cases:

    • Test with very long descriptions to verify truncation logic
    • Test with entity that has no description
    • Test with entity that has no name
    • Test that the returned URL matches the request host/scheme

  2. Test could be more specific (line 27): Instead of assert "MONDO:0020121" in html or "muscular dystrophy" in html.lower(), test both the entity ID and name separately for clarity.


Performance Considerations

  1. No caching: Each bot request triggers a Solr query. For popular entities shared frequently, consider adding response caching:

    • HTTP cache headers (Cache-Control, ETag)
    • Or application-level caching of entity metadata

  2. Template compilation: The Jinja2 template is compiled on every request. Consider caching the compiled template (though this may already be handled by Jinja2's internal caching).


🛡️ Security Considerations

  1. ✅ XSS protection: Properly implemented with Jinja2 autoescape and tested.

  2. ✅ No SQL injection: Using the Solr client properly.

  3. ⚠️ Open redirect potential: The meta refresh tag could theoretically be exploited if URL construction is ever changed. Add validation to ensure the URL is always on the same domain.

  4. ⚠️ Host header injection: get_base_url() uses X-Forwarded-Proto and Host headers from the request. If nginx isn't properly configured to strip/override these headers from untrusted sources, an attacker could manipulate the OG URLs. Verify that nginx is the only source setting these headers.
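
The same-domain validation mentioned in items 3 and 4 could look roughly like this. The allow-list simply reuses the two hosts named in the PR description, and the function name is hypothetical; treat both as assumptions.

```python
from urllib.parse import urlsplit

# Hosts named in the PR description; treat this allow-list as an assumption.
ALLOWED_HOSTS = frozenset({"monarchinitiative.org", "beta.monarchinitiative.org"})


def is_allowed_url(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Accept only http(s) URLs whose host is on the allow-list."""
    parts = urlsplit(url)
    # hostname is None for relative or scheme-only URLs, which are rejected.
    return parts.scheme in {"http", "https"} and parts.hostname in allowed_hosts
```

Checking the parsed hostname rather than a string prefix avoids accepting look-alike URLs such as https://monarchinitiative.org.evil.example/.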


📝 Minor Issues

  1. Inconsistent string formatting: The code mixes f-strings and .format(). Stick to f-strings throughout for consistency.

  2. Type hints: get_base_url() and get_default_image() already declare return types, so no change is needed here.

  3. Logging could be more detailed (line 57): Consider logging the exception type and traceback at debug level for easier troubleshooting.


🎯 Recommendations

Must Fix:

  1. Add proxy headers to the nginx bot proxy configuration (critical for correct URL derivation)

Should Fix:

  2. Add URL validation to prevent potential open redirect
  3. Verify CURIE regex pattern matches expected entities only
  4. Add test coverage for edge cases (long descriptions, missing fields)

Nice to Have:

  5. Add HTTP caching headers for bot responses
  6. Extract magic number to constant
  7. Improve description truncation to avoid cutting mid-word


📊 Overall Assessment

Code Quality: ⭐⭐⭐⭐ (4/5)
Security: ⭐⭐⭐⭐ (4/5)
Test Coverage: ⭐⭐⭐⭐ (4/5)
Documentation: ⭐⭐⭐⭐⭐ (5/5)

This is solid work! The main concern is the missing proxy headers in nginx, which could cause incorrect URLs in certain deployment scenarios. Once that's addressed and a few edge case tests are added, this would be ready to merge.

Great job on the comprehensive testing and documentation! 🎉

@codecov

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.51%. Comparing base (6b15e4c) to head (bfadb8a).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1251      +/-   ##
==========================================
+ Coverage   72.62%   73.51%   +0.88%     
==========================================
  Files          93       95       +2     
  Lines        3306     3417     +111     
==========================================
+ Hits         2401     2512     +111     
  Misses        905      905              


- Add proxy_set_header directives to nginx bot proxy block for correct URL generation
- Improve error handling: ValueError -> 404, unexpected errors -> 500
- Extract MAX_DESCRIPTION_LENGTH constant with word-boundary truncation
- Add Cache-Control header (public, max-age=3600) to meta responses
- Strengthen test assertions and add coverage for edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude bot commented Feb 9, 2026

Pull Request Review: Dynamic Open Graph Meta Tags for Social Media Link Previews

Overall Assessment

This is a well-implemented feature with excellent documentation, comprehensive test coverage, and thoughtful design. The solution elegantly solves the SPA meta tag problem by detecting bots at the nginx layer and serving dynamic content from a FastAPI endpoint.

✅ Strengths

  1. Excellent Architecture - Clean separation of concerns, minimal performance impact, environment-agnostic URL handling
  2. Strong Security - Proper XSS prevention via Jinja2 autoescape, good error handling
  3. Comprehensive Testing - 8 unit tests covering edge cases plus integration tests
  4. Great Documentation - Detailed docs with architecture diagrams
  5. Robust Error Handling - Differentiates between 404 and 500 errors appropriately

🔍 Issues and Recommendations

1. Nginx Config: Potential Performance Issue (Medium Priority)

Location: services/nginx/config/default.conf:46-61

The use of if directives to combine conditions is not ideal in nginx. The current string concatenation approach works but is unconventional. While acceptable for now, consider using nested locations or a map that combines both conditions. See: https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/

2. Description Truncation Edge Case (Low Priority)

Location: meta.py:85-86

If the description has no spaces, rsplit(" ", 1) returns the entire slice unchanged, so the result is MAX_DESCRIPTION_LENGTH + 3 characters long once the ellipsis is appended. Consider ensuring truncated strings never exceed the limit even without word boundaries.
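
A sketch of a stricter variant that reserves room for the ellipsis and falls back to a hard cut when there is no space to break on. The function name and parameters are hypothetical, and the sketch assumes the limit is larger than the ellipsis.

```python
def truncate_strict(description: str, limit: int = 300, ellipsis: str = "...") -> str:
    """Truncate so the result, ellipsis included, never exceeds `limit` characters.

    Assumes limit > len(ellipsis).
    """
    if len(description) <= limit:
        return description
    budget = limit - len(ellipsis)
    head = description[:budget]
    # Prefer a word boundary, but keep the hard cut when the text has no spaces.
    if " " in head:
        head = head.rsplit(" ", 1)[0]
    return head + ellipsis
```

With this version a 500-character string without spaces comes back at exactly 300 characters, ellipsis included.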

3. Missing Rate Limiting (Medium Priority)

Location: meta.py:37

The endpoint has no rate limiting. Consider adding it at nginx level or via FastAPI middleware to prevent aggressive bot crawling.

4. CURIE Regex Pattern Limitation (Low Priority)

Location: services/nginx/config/default.conf:26

The pattern ~^/[A-Za-z_]+:[A-Za-z0-9_.-]+$ doesn't allow numbers in prefix. Consider documenting expected CURIE format or using a more permissive pattern.
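
To make the trade-off concrete, the quoted pattern can be probed in Python. The second pattern below is a hypothetical alternative that also accepts digits after the first prefix character, and the sample paths other than /MONDO:0005148, /about and /_:--- are invented for illustration.

```python
import re

# Pattern as quoted from default.conf in the review.
CURRENT = re.compile(r"^/[A-Za-z_]+:[A-Za-z0-9_.-]+$")

# Hypothetical alternative: digits allowed in the prefix after the first character.
WITH_DIGITS = re.compile(r"^/[A-Za-z_][A-Za-z0-9_]*:[A-Za-z0-9_.-]+$")


def matches(pattern: "re.Pattern[str]", path: str) -> bool:
    """True when the whole path satisfies the pattern."""
    return pattern.match(path) is not None


# Collect results for a few sample paths under both patterns.
samples = ["/MONDO:0005148", "/about", "/_:---", "/ABC1:123"]
report = {p: (matches(CURRENT, p), matches(WITH_DIGITS, p)) for p in samples}
```

Both patterns accept /MONDO:0005148 and the degenerate /_:---, and both reject /about; only the alternative accepts a prefix containing a digit.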

5. Missing Image Validation (Low Priority)

Location: meta.py:32-34

The default OG image path /share-thumbnail.jpg is hardcoded without verification. Consider adding a comment documenting the expected location.

💡 Future Enhancements

  • Entity-specific images per entity type
  • Monitoring/metrics for bot requests
  • Structured data (JSON-LD) for better SEO
  • Vary: User-Agent header for proper caching

🧪 Test Coverage

Excellent coverage with 8 unit tests. Minor gaps:

  • No test for get_base_url() with different header combinations
  • No test for very long entity IDs
  • No test for URL encoding edge cases

📋 Summary

Recommendation: APPROVE with minor suggestions

This is production-ready code demonstrating strong security awareness, good engineering practices, and excellent documentation. The identified issues are low-priority items that can be addressed in follow-up PRs if needed.

Great work on this feature! The architecture is clean and the implementation is solid.

Member

@ptgolden ptgolden left a comment

This looks good. My only feedback is the potential removal of that try/catch block and the nginx tests.

In an ideal world, we would have a slightly different architecture where this would not be necessary, and going to an entity page would just serve the meta tags properly no matter what. (i.e. the entity could be prefetched and included in the response as JSON so that the SPA could pick it up without having to immediately make a request). But that would be a much larger and more disruptive change, and it's not in the cards.

@kevinschaper you mentioned removing the meta endpoint from API documentation. I don't have an opinion on it either way. I don't think it hurts to leave it in, but I understand wanting to keep it "private".

kevinschaper and others added 4 commits February 10, 2026 12:36
Remove unnecessary try/catch: get_entity returns None (doesn't raise),
and FastAPI handles uncaught exceptions as 500s already. Matches the
pattern in entity.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

These don't fit the project's testing framework and add maintenance
burden for shell code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This is an internal endpoint for bot crawlers, not part of the public API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@claude

claude bot commented Feb 10, 2026

PR Review: Dynamic Open Graph Meta Tags for Social Media Link Previews

Summary

This is a well-implemented feature that addresses a real need for better social media link previews. The architecture is clean, the code is readable, and the test coverage is excellent. The PR includes comprehensive documentation and handles edge cases thoughtfully.

Strengths

1. Clean Architecture

The nginx-based bot detection + FastAPI backend approach is elegant and performant:

  • Bots get dynamic meta tags without hitting the SPA
  • Regular users get the normal SPA experience
  • No JavaScript required for crawlers
  • Environment-agnostic URL derivation from request headers

2. Excellent Test Coverage

The test suite (backend/tests/api/test_meta.py) is comprehensive with 8 tests covering:

  • Basic functionality, 404 handling, XSS protection (HTML escaping)
  • Long description truncation, Missing name/description edge cases, Cache headers

3. Security Conscious

  • Jinja2 autoescape enabled for XSS protection (line 18 in meta.py)
  • Proper HTML escaping tested (test_meta.py:45-61)
  • Cache-Control headers set appropriately (1 hour)

4. Good Documentation

The docs/link-previews.md file provides clear explanations of the architecture, testing instructions, and configuration guidance.


Code Quality Issues

1. Nginx Configuration - Use of if Directive (Low Priority)

Location: services/nginx/config/default.conf:44-61

The implementation uses nginx if directives for conditional logic. While this works, nginx's if is notoriously tricky. The workaround of concatenating strings to create "BE" is clever and safe in this context. Consider documenting this pattern for future maintainers.

2. Description Truncation Edge Case (Very Low Priority)

Location: backend/src/monarch_py/api/meta.py:74-75

If a description somehow had no spaces, rsplit(" ", 1) would return the full string. This would result in a 303-character description instead of 300. In practice, entity descriptions will always have spaces.

3. TEMPLATES_DIR Error Handling (Very Low Priority)

Location: backend/src/monarch_py/api/meta.py:15-18

If TEMPLATES_DIR doesn't exist, this will fail at module import time. While fail-fast on import is acceptable, a clearer error message could help.


Security Considerations

1. Meta Refresh Open Redirect (Low - Currently Safe)

Location: backend/src/monarch_py/api/templates/meta.html:28

The url variable is constructed from request headers and entity ID. Currently safe because nginx validates the CURIE format (lines 24-26), which prevents arbitrary URLs. However, this is a subtle dependency - if someone later modifies the nginx config to be more permissive, this could become an open redirect vulnerability.

Recommendation: Add a comment in both the nginx config and the Python code noting this security dependency.

2. Rate Limiting (Low Priority - Infrastructure Concern)

The endpoint doesn't have explicit rate limiting. Consider adding rate limiting at the nginx level for /v3/api/meta/* path, or document that this should be handled by infrastructure.


Performance Considerations

Cache Headers: Good choice of 1-hour cache (meta.py:88). Entity data doesn't change frequently.

Solr Query Efficiency: Good use of extra=False to avoid fetching unnecessary association data (meta.py:53).


Testing

The test suite is excellent and covers happy path, 404 handling, XSS protection, description truncation, missing data edge cases, and cache headers. All tests use proper mocking.

Nice to Have: Consider adding tests for get_base_url() with various header combinations and very long entity names.


Recommendations Summary

Must Fix: None

Should Consider:

  1. Add a comment documenting the nginx/Python security dependency for open redirect prevention
  2. Consider rate limiting at the infrastructure level

Nice to Have:

  1. Add startup validation for TEMPLATES_DIR existence
  2. More robust description truncation (handle no-space edge case)
  3. Document nginx if directive usage pattern
  4. Additional test coverage for get_base_url()

Approval

This is high-quality work that follows best practices. The implementation is clean, well-tested, and thoroughly documented. The architecture is sound and the code is maintainable.

Recommendation: Approve with optional consideration of the Should Consider items above.

Great work!

@kevinschaper kevinschaper merged commit 3453d6a into main Feb 10, 2026
17 checks passed
@kevinschaper kevinschaper deleted the issue-1242-link-previews branch February 10, 2026 22:36

Development

Successfully merging this pull request may close these issues.

Add dynamic Open Graph meta tags for entity pages (better link previews)

2 participants